Machine Translation - 09: Monolingual Data
Author
Abstract
4.2.1 Balancing the LM and TM
In order for the decoder to flexibly balance the input from the LM and TM, we augment the decoder with a “controller” mechanism. The need to balance the two signals depends on the word being translated. For instance, in the case of Zh-En, there are no Chinese words that correspond to articles in English, in which case the LM may be more informative. On the other hand, if a noun is to be translated, it may be better to ignore any signal from the LM, as it may prevent the decoder from choosing the correct translation. Intuitively, this mechanism helps the model dynamically weight the different models depending on the word being translated. The controller mechanism is implemented as a function taking the hidden state of the LM as input and computing
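The excerpt breaks off before stating what the controller computes, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes the controller maps the LM hidden state to a scalar gate in (0, 1) through a sigmoid, and that the gate rescales the LM state before it is concatenated with the TM state. The names `controller_gate`, `fuse_states`, `v_g`, and `b_g` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def controller_gate(lm_hidden, v_g, b_g):
    """Assumed controller: a scalar gate computed from the LM hidden state.

    v_g (weight vector) and b_g (bias) are hypothetical learned parameters.
    """
    return float(sigmoid(np.dot(v_g, lm_hidden) + b_g))


def fuse_states(tm_hidden, lm_hidden, v_g, b_g):
    """Assumed fusion: concatenate the TM state with the gated LM state."""
    gate = controller_gate(lm_hidden, v_g, b_g)
    fused = np.concatenate([tm_hidden, gate * lm_hidden])
    return fused, gate


# Toy example with random states; the dimensions are arbitrary.
rng = np.random.default_rng(0)
lm_hidden = rng.standard_normal(4)
tm_hidden = rng.standard_normal(3)
v_g = rng.standard_normal(4)

fused, gate = fuse_states(tm_hidden, lm_hidden, v_g, b_g=0.0)
# A strongly negative bias pushes the gate toward 0, muting the LM signal.
_, closed_gate = fuse_states(tm_hidden, lm_hidden, v_g, b_g=-50.0)
```

Under these assumptions, a gate near 1 passes the LM signal through (plausible when producing an English article with no Chinese counterpart), while a gate near 0 suppresses it (plausible when translating a noun).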
Similar resources
Improving Neural Machine Translation Models with Monolingual Data
Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for neural machine translation (NMT). In contrast to previous work, which integrates a sepa...
Enabling Monolingual Translators: Post-Editing vs. Options
We carried out a study on monolingual translators with no knowledge of the source language, but aided by post-editing and the display of translation options. On Arabic-English and Chinese-English, using standard test data and current statistical machine translation systems, 10 monolingual translators were able to translate 35% of Arabic and 28% of Chinese sentences correctly on average, with so...
Forms Wanted: Training SMT on Monolingual Data
We propose and evaluate a simple technique of “reverse self-training” for statistical machine translation. The technique makes it possible to extend the target-side vocabulary of the MT system using target-side monolingual data, and it is aimed especially at translation into morphologically rich languages.
Exploiting Source-side Monolingual Data in Neural Machine Translation
Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently become a new paradigm. Researchers have proven that the target-side monolingual data can greatly enhance the decoder model of NMT. However, the source-side monolingual data is not fully explored although it should be useful to strengthen the encoder model of NMT, especially when the parallel corpus is far fr...
Language and Translation Model Adaptation using Comparable Corpora
Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to generate, monolingual data are relatively common. Yet monolingual data have been under-utilized, having been used primarily for training a language model in the target language. This paper describes a novel method for utiliz...
Journal title:
Volume, issue:
Pages: -
Publication date: 2018